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A MULTISCALE RANDOM FIELD MODEL 
FOR BAYESIAN IMAGE SEGMENTATION 


1 INTRODUCTION 


Background 

Many approaches to Bayesian image segmentation have used maximum a posteriori (MAP) 
estimation in conjunction with Markov random fields (MRF). This approach performs well, but has a 
number of disadvantages: exact MAP estimates cannot be computed; approximate MAP estimates are 
computationally expensive to calculate; and unsupervised parameter estimation of the MRF is difficult. 

This study developed a new approach to Bayesian image segmentation that directly addresses these 
problems. The new method replaces the MRF model with a novel multiscale random field (MSRF), and 
replaces the MAP estimator with a sequential MAP (SMAP) estimator derived from a novel estimation 
criteria. Together, the proposed estimator and model result in a noniterative segmentation algorithm that 
can be computed in time proportional to MN where M is the number of classes and N is the number of 
pixels. The classes are modeled as a Gaussian mixture model. Given sample pixels for each class, an 
unsupervised, computationally efficient method was developed both for estimating the number of 
subclasses in each class as well as the parameters for each subclass. 

This multispectral SMAP algorithm and the multimodal mixed signature model described in this 
report have been implemented in version 4.1 of the Geographic Resources Analysis Support System 
(GRASS). GRASS is a public domain geographic information system (GIS) and image processing system 
originally developed by researchers at the U.S. Army Construction Engineering Research Laboratories 
(USACERL), in Champaign, IL. GRASS is used to input, display, analyze, and output geographic data 
by users in both military and nonmilitary, and public and private agencies, based all over the world. 
GRASS is also a component of the Integrated Training Area Management (ITAM) program implemented 
by USACERL to assist in the management of Army training lands. 

The ITAM program elements include inventory and monitoring of Army installation natural 
resources, integration of land management and training missions, training area rehabilitation, and 
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environmental awareness training for military personnel. The collection of ground-level data for natural 
resource inventory and evaluation is conducted through the Land Condition Trend Analysis (LCTA) 
element of the ITAM program (Tazik et al. 1992). Both space-based remotely-sensed data and 
ground-level data are collected for Army installations where the ITAM program is implemented. A 
responsibility to inventory and monitor installation natural resources has led to the development of the 
LCTA program. Research to support the LCTA program helps develop and apply available technologies 
to enhance natural resource stewardship on U.S. Army lands. Coupling remotely-sensed digital data with 
traditional ecological ground data has been identified as a technology with a potential to improve land 
managers’ capabilities to inventory and monitor Army installation natural resources. LCTA data sets were 
used to test the image segmentation algorithms described in this report. 


Objective 

The objective of this study was to develop a new approach for processing remotely-sensed imagery 
using an advanced image segmentation algorithm that contextually smooths or filters digital data so that 
the classified polygons better model the ecological units on the ground. This algorithm was designed as 
an analysis tool to improve the ability to use ground-truthed LCTA data to verify remotely sensed 
electronic data. 


Approach 

Literature on existing approaches to Bayesian image segmentation was reviewed. A new 
contextual-based multispectral image processing procedure was defined, developed, and implemented in 
the GRASS using a multimodal mixed Gaussian distribution to represent spectral signatures for classes. 
The accuracy of the procedure was tested with data for areas where both ground-truth data and 
multispectral remotely-sensed imagery were already available. 


Scope 

The algorithms developed in this study and programs that implement these algorithms are part of 
the 4.1 release of GRASS, and may not be accurate for post-4.1 releases of GRASS. 
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Mode of Technology Transfer 


The information in this report will be transferred to the field through its implementation in GRASS. 
GRASS technology is disseminated through training programs. Cooperative Research and Development 
Agreements, User Conferences, and the GRASSClippings quarterly newsletter, the GRASS Support Center, 
extensive documentation, institutional structures at Army and Interagency levels, communication networks, 
among other forums. 

User feedback on the algorithms described in this report and their implementation in GRASS is 
desirable and important to the development of GRASS. Users are encouraged to provide such feedback 
to the GRASS development staff at USACERL via electronic mail grassbug@zorro.cecer.army.mil, and 
through the GRASS Information Center, U.S. Army Construction Engineering Research Laboratories 
(USACERL), P.O. Box 9005. Champaign, IL 61826-9005, Phone 217/373-7220, FAX 217/373-7222. 
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2 TECHNICAL OVERVIEW OF BAYESIAN IMAGE SEGMENTATION 


Haralick and Shapiro (1985) have suggested that a good segmentation of an image should separate 
the image into simple regions with homogeneous behavior. In recent years, many authors have used 
Bayesian estimation techniques as a framework for computing segmentations that best compromise 
between these two opposing objectives (Therrien 1983; Besag 1986; Derin and Elliott 1987). These 
methods model the shape of segmented regions in addition to the behavior of pixels in each homogeneous 
region. The segmentation is then computed by estimating the best label for each pixel. 

A number of estimation techniques and region models have been used for the Bayesian segmentation 
problem. Typically, the labels of image pixels are modeled as a Markov random field (MRF) or 
equivalent Gibbs distributions (Besag 1974). These models are used because they only require the 
specification of spatially local interactions using a set of local parameters. This is important since spatially 
local interactions result in segmentation algorithms that only require local computations. Most often, the 
image is then segmented by approximately computing the maximum a posteriori (MAP) estimate of the 
pixel labels. 

These statistical approaches to segmentation provide an important framework, and have improved 
results in the application of segmentation to natural scenes (Pappas 1972), tomographic cross sections 
(Sauer and Bouman 1993), texture images (Therrien 1983), and multispectral remotely sensed images 
(Zhang, Haralick, and Campbell 1990; Jeon and Landgrebe 1991; Kato, Zerubia, and Berthod 1992; and 
Jeon and Landgrebe 1993). However, the approach has a number of important disadvantages. 

Computing the MAP estimate requires the minimization of a discrete functional with many local 
minima. Exact minimization is intractable, so the true MAP estimate must be approximately minimized. 
Methods for approximation include simulated annealing (Geman and Geman 1984), greedy minimization 
(Besag 1986), dynamic programming (Derin and Elliott 1987), and multiresolution minimization (Gidas 
1989; Bouman and Liu 1988; Bouman and Liu 1989; and Perez and Heitz 1992). However, all of these 
approaches require approximations in the two dimensional case, and are either iterative or computationally 
expensive. 

The MRF model has a limited ability to describe large-scale behaviors. For example, even though 
segmented regions are likely to be at least 50 pixels wide, it is difficult to accurately incorporate this 


8 







information by specifying the interactions of adjacent pixels. The model can be improved by using a 
larger neighborhood for each pixel, but this rapidly increases the number of parameters of interaction, and 
the complexity of the segmentation algorithms. The fundamental limitation of local models is that they 
do not allow behavior to be directly controlled at different spatial scales. This is of critical importance 
since scale variation occurs naturally in images, and is important in quantifying image behavior (Pentland 
1984; and Peleg et al. 1984). 

The MAP estimate does not have desirable properties for the segmentation problem (Marroquin, 
Mitter, and Poggio 1987; Dubes et al. 1990). The MAP estimate minimizes the probability that any pixel 
in the image will be misclassified. This is an excessively conservative criteria since any segmentation 
algorithm is likely to result in some misclassified pixels. In practice, it has been noted that MAP 
estimation has some undesi 'le global properties that may actually make an approximate minimization 
more desirable (Besag 1986; Dubes et al. 1990). For example, in multiresolution segmentation, MRF 
correlations parameters were found to increase at coarser scales (Bouman and Liu, 1988, 1991). This is 
counter to the physical intuition that coarser sampling should produce less correlation. 

The maximizer of the posterior marginals (MPM) estimator has been suggested as an alternative to 
MAP estimation, since it minimizes the probability of classification error (Marroquin, Mitter, and Poggio 
1987). However, it may only be approximately calculated in a computationally expensive procedure 
similar to simulated annealing. Also, the MPM criteria does not consider the spatial placement of errors 
when distinguishing among the quality of segmentations. 

Finally, parameter estimation of MRFs is difficult. When parameters are above the “critical 
temperature,” there may be no consistent estimator as the image size grows to infinity (Pickard 1987). 
Methods have been developed to estimate MRF parameters from images being segmented (Lakshmanan 
and Derin (1989), but they are computationally expensive. 

This study addressed these difficulties by introducing a new approach to Bayesian image 
segmentation. This method replaces the MRF model with a novel multiscale random field (MSRF), and 
replaces the MAP estimator with a sequential MAP (SMAP) estimator derived from a new estimation 
criteria. Together, the proposed estimator and model result in a noniterative segmentation algorithm that 
can be computed in time proportional to MN where M is the number of classes and N is the number of 
pixels. A method for estimating the parameters of the MSRF model directly from the image during the 
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segmentation process is also developed. This allows images with differing region sizes to be segmented 
accurately without specific prior knowledge of their behavior. 

This MSRF model is composed of a series of random fields progressing from coarse to fine scale. 
Each field is assumed to only depend on the previous coarser field. Therefore, the series of fields form 
a Markov chain in scale or resolution. Further, this assumes that points in each field are conditionally 
independent given their coarser scale neighbors. This leads to a rich model with computationally tractable 
properties. In fact, Luettgen, Karl, Willsky and Tenney (1993) have shown in independent work that 
models similar to the MSRF actually include MRFs as a subclass. 

Earlier work (Chou et al 1989; Chou, Golden, and Willsky 1991; Basseville et al. 1992; Basseville, 
Benveniste, and Willsky 1992a, 1992b) has shown that Markov Chains in scale can be used to model 
continuously valued Gaussian processes in one and two dimensions. This work has resulted in fast 
algorithms for problems such as optical flow estimation (Chou et al. 1991). Estimation for these models 
is performed using a generalization of Kalman filtering (Basseville et al. 1992; Basseville, Benveniste, and 
Willsky 1992a, 1992b). This approach is ideal for Gaussian models since the MAP, conditional mean, 
and minimum mean squared estimates coincide, and may be computed using only recursively computed 
first and second order statistics. However, since this model requires discrete value:, to represent pixel 
classes, these methods are not applicable. 

The MSRF model has a number of advantages over fixed scale MRFs. The Markov chain structure 
facilitates straightforward methods for parameter estimation since it eliminates difficulties with intractable 
normalizing constants (partition functions) found in MRFs. Yet the model does not impose an unnatural 
spatial ordering to the pixels since the Markov chain is in scale. Also, since explicit parameters are 
available to control both coarse and fine scale behavior, the MSRF model can more accurately describe 
image behavior. 

The SMAP estimation method results from minimizing the expected size of the largest misclassified 
region. This is accomplished by assigning progressively larger cost to errors at a coarser scale. 
Intuitively, the criteria accounts for the fact that an error at coarse scale is more serious since it causes 
the misclassification of many pixels. The SMAP criteria results in a series of optimization steps going 
from a coarse to fine scale. At each scale, the best segmentation is computed given the previous coarser 
segmentation and the observed data. Each maximization step is computationally simple and noniterative 
if the region parameters are known. The complete procedure is similar to pyramidal pixel linking 
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procedures (Burt, Hong, and Rosenfeld 1981; Antonisse 1982), but requires local computations much like 
those used in Bayesian networks (Pearl 1988). 

Unknown region parameters may be estimated using an iterative procedure at each scale. This 
iterative procedure, based on the expectation maximization (EM) algorithm (Dempster, Laird, and Rubin 
1977), is implemented by subsampling the image. Therefore, parameter estimation only increases the 
required computation by approximately a factor of two. 







3 MULTISCALE SEGMENTATION APPROACH 


The random field Y is the image that must be segmented into regions of distinct statistical behavior. 
(Upper case letters denote random quantities, while lower case letters denote the corresponding 
deterministic realizations.) Individual pixels in Y are denoted by Y s where s is a member of a two 
dimensional lattice of points S. 

The basis of this segmentation approach is a hierarchical or doubly stochastic model (Figure 1). 
This model assumes that the behavior of each observed pixel depends on a corresponding unobserved label 
pixel in X. Each label specifies one of M possible classes, each with its own statistical behavior. The 
dependence of observed pixels on their labels is specified through p yj Jy /x), the conditional distribution 
of Y given X. Prior knowledge about the size and shapes of regions will be modeled by the prior 
distribution p(x). 

Since a variety of features can be used with this approach, it is a general framework for the 
segmentation problem. For the texture segmentation problem, a stochastic texture model can be used for 
PylJy/x) (Derin and Elliott 1987; Manjunath, Simchony, and Chellappa 1990; Bouman and Liu 1991), 
or texture feature vectors can be extracted at each pixel (Laws 1980; Unser and Eden 1989, 1990) and 
modeled with a multivariate distribution. However, segmentation of multispectral remotely sensed images 
will serve as the target application for the examples. In this case, a multivariate Gaussian mixture 
distribution will be used to describe the behavior of the spectral components of the image. Appendix D 
describes the method used for estimating the parameters for this mixture model. 



Y - Observed image containing 
four distinct textures 


X - Unobserved field containing 
the class of each pixel 


Figure 1. Structure of a Doubly Stochastic Random Field Used in Segmentation. 
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The following sections describe the general structure of a MSRF model for p(x) and develop a 
sequential MAP estimation approach for computing the best segmentation. Detailed models and recursion 
formulas resulting from this framework are then derived in Chapter 4. 


Multiscale Random Field Model 

The multiscale Random Field (MSRF) model is composed of a series of random fields at varying 
scales or resolutions. Figure 2 shows the pyramid structure of the MSRF. At each scale, n, the 
segmentation or labeling is denoted by the random field X* 1 , and the set of lattice points is denoted by 
S ,K> . In particular, X 0> is assumed to be the finest scale random field with each point corresponding to a 
single image pixel. Each label at the next coarser scale, X'\ then corresponds to a group of four points 
in the original image. Therefore, the number of points in S 1 '* is one-fourth the number of points in S l0> . 

The fundamental assumption of the MSRF model is that the sequence of random fields from coarse 
to fine scale form a Markov chain. Therefore, the distribution of X n> given all coarser scale fields only 
depends on X 01 * 0 . This is a reasonable assumption since X fn+1) should contain all the relevant information 
from previous coarser scales. Formally, this Markov chain relation may be stated as: 

P(X {n) = * (,,) |X (,) = x®l>n) = P(X (B) = JC (B) IX ( " + 1) = x ( " + 1) ) 

[Eq 1] 



Figure 2. Pyramid Structure of the MSRF. 
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Correspondingly, the exclusive dependence of Y on X 0} implies that: 


P(yedy|X ( ">n>0) = P(Yedy\X (0) ) 

= Py\x«dy I*®) 


The joint distribution of X and Y may then be expressed as the product of these distributions: 


P(Yedy,X=x) = p yj , ( o>Cy|jc (0) ) 


L -1 


n=0 




It, 


where L is the coarsest scale in X. This Markov structure in scale has the isotropic behavior associated 
with MRFs, but in addition, the causal dependence in scale results in a noniterative segmentation algorithm 
and direct methods of parameter estimation. 


Sequential MAP Estimation 

To segment the image, Y, the pixel labels in X must be accurately estimated. Bayesian estimation 
techniques are the natural approach since the existence of a prior distribution p(x) is assumed. Generally 
Bayesian estimators attempt to minimize the average cost of an erroneous segmentation by solving the 
optimization problem: 


x = argmin£[C (X,x) | Y = y] [Eq4] 

X 

where C(Xjc) is the cost of estimating the true segmentation, X, by the approximate segmentation, x. 
Notice that X is a random quantity whereas x is a deterministic argument. Of course, the choice of the 
functional, C( - , -), is of critical importance since it determines the relative importance of errors. 
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A look at the assumptions of its derivation will help explain the deficiencies of the MAP estimate. 
TTie MAP estimate is the solution to [Eq 4] when the cost functional is given by: 

C iWAP&’X) = 1 -6(X -JC) [Eq 5] 

where 8(X-x) is 1 when X = x and 0 otherwise. Since C MAF (Xpc) = 1 whenever any pixel is incorrectly 
labeled, the MAP estimate maximizes the probability that all pixels will be correctly labeled. Of course, 
a segmentation need not be completely accurate at all pixels to be useful. Even good segmentations will 
normally have erroneously classified pixels along region boundaries. This is particularly true in high 
resolution images where the misclassification of a single pixel is not significant. Therefore, the MAP 
estimate can be excessively conservative (Marroquin, Mitter, and Poggio 1987; Dubes et al. 1990). 

The implications of the MAP criteria appear even more inappropriate for the estimation of the 
MSRF introduced in the previous sections. The cost function used for MAP estimation of a MSRF is: 

<W(X, x) = 1 -6(X-x) 

[Eq 6] 

= l-fl 6(X<">-*<">) 

n-0 


This cost function is 1 if a labeling error occurs at any scale, n, of the segmentation. Consequently, this 
function assigns equal cost to a single mislabeled pixel at n=0 or the mislabeling of approximately 256 
pixels at n=4. This cost assignment is clearly undesirable. 

Ideally, a desirable cost function should assign progressively greater cost to segmentations with 
larger regions of misclassified pixels. To achieve this goal, the following alternative cost function is 
proposed: 


'SMAP (X> x ) = ~ 


L 


+ E 2 Hl C n (X,x) 

n =0 


[Eq 7] 
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[Eq 8] 


where 

CJX.x) = 1 - n t(X m -x a ) 

i = H 


The behavior of C SMAP is solely a function of the coarsest scale, K, that contains a misclassified pixel. 
More precisely, let K be the unique scale such that X K> * xf K> , but X (i> = xf' 1 for all i>K. Then the functions 
C„ are given by : 


C H (X,x) 


lifnzK 

Qifn>K 


[Eq 9] 


and the total cost is given by C SMAP (X, x) = 2*. This error at scale K will generally lead to the 
misclassification of a group of pixels at the finest scale. The width of this misclassified group of pixels 
will be approximately 2 K - C SklAP (X, x). Therefore, the SMAP cost function has the following intuitive 
interpretation. 


C SMAP {X, jc) « width of the largest grouping of misclassified pixels [Eq 10] 


The estimator that minimizes this proposed cost is determined by evaluating [Eq 4]. 


x = argminE[C SMAP (X,x)|Y=y] 


= argmin 2 n ' 1 { 1 -P(X® = x (i) i | Y = y)} 

* n=0 

L 

= arg max53 2“ P(X® = x (i) Un | Y = y) 

* n=0 


[Eq 11] 


Since the random fields, X!*\ form a Markov Chain, this estimate is computed recursively in the scale 
parameter n is done by assuming that x (i) has been computed for i>n, and using this result to compute 
Appendix A shows that this recursive approach yields the following expression for the solution: 


# n) = arg max {logp^y,.^ (jr 00 !#"* 1 *, y)+ %{x in) )} 


[Eq 12] 







[Eq 13] 


where ^Tis a second order term which may be bounded by 

0 <; &(x {n) ) z maxp x (.-,).(,,) < < 1 

x<-'> 

The approximation that Sk< 1 is very good. To see this, notice that xf"' 11 is an interpolation of the coarser 
segmentation, x (n) , given the image, y. Normally, there will be many pixels in the interpolation x!" h for 
which the correct labeling is uncertain. This is particularly true around the boundaries of objects. Since 
the number of unique labeling combinations for these pixels is enormous, the probability of any particular 
combination will be small. In fact for these models, this probability goes to 0 as the number of pixels, 
N, increases. Therefore: 

lim W (JC (B) > =0 [Eq 14] 

N 


At very coarse scales, the number of labels becomes small, and often only one reasonable interpolation 
will exist (i.e., 1). However, in this case the correct labeling of pixels at the coarser scale, n, must 

also be unambiguous, and any reasonable estimator should have good performance. Ignoring the 
contribution of 9 results in the following recursive equations: 

^ = arg max log p « . (x w | y) [Eq 15] 

x (n) = argmaxlogp x (.) U(B M) (x (n) |x (n+1) ,y> [Eq 16] 

x« 

The recursion is initialized by determining the MAP estimate of the coarsest scale field given the observed 
data. The segmentation at each finer scale is then found by computing the MAP estimate of X n) given 
i (iul) and the image, y. Due to this structure, this estimator is referred to as a sequential MAP (SMAP) 
estimator. 

By using Bayes rule, the Markov properties of X, and assuming that X L> is uniformly distributed, 
the SMAP recursion may be rewritten in a form that is more easily computed: 

x (L) = arg max logo , m (y | x {L) ) [Eq 17] 

*<"> 
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x (n) = arg max {log p ylJcM (y |* (B) ) + lQgP Jt «| Jc fr»i>(* M |i (n + 1) )} [Eq 18] 

jfW 

The two terms in [Eq 18] play roles analogous to those of the likelihood function and prior distribution 
in conventional Bayesian estimation. The first term of the maximization gives the likelihood of the 
observed data y given the labeling at scale n. The second term of the maximization embodies the a priori 
information about the behavior of X. Therefore, this term biases the solution to favor segmentations with 
large regions and smooth boundaries. 

The SMAP estimator has a number of additional advantages. Chapter 4 introduces specific models 
so that each optimization step of [Eq 18] may be computed with a single noniterative pass. This is in 
contrast to the MAP and MPM estimators, which require computationally expensive iterative optimization 
methods (Geman and Geman 1984; Besag 1983; Marroquin, Mitter, and Poggio 1987). Further, the 
SMAP estimator has a subtle advantage over MPM. The MPM method chooses each pixel individually 
to minimize the probability of error. However, it does not consider the spatial placement of errors. Since 
the SMAP method attempts to minimize the spatial size of errors, it will tend to generate a subjectively 
more desirable segmentation. 
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4 SEGMENTATION ALGORITHM 


The specific models used for Y and X are defined by specifying the conditional density p lxm (y \ x (0> ), 

together with the coarse to fine scale transition densities p xMlxM ,( jt ( ' ,) ljr , "*' ) ). An adaptive segmentation 

algorithm that estimates the parameters of the MSRF during the segmentation process can then be 
developed. 

For the multispectral segmentation problem, it is best to use models with observed pixels that are 
conditionally independent given their class labels. This implies that the spatial texture of regions will not 
be used as a discriminating feature. Instead, the multispectral characteristics of classes help to discriminate 
distinct regions. This approach is supported by the fact that spatial correlation has been found to be weak 
within regions of multispectral images corresponding to a single ground cover (Landgrebe 1980; Kettig 
and Landgrebe 1976). Using this assumption, the conditional density function for the image has the form: 


/V°)04* (0) ) = II Py± m W*i 0) ) [Eq 19] 

ses<*> ‘ 


where '|k) is the conditional density function for an individual pixel given the class label k. Since 
each pixel is composed of D multispectral components, P y ^(' | A) is a multivariate density function. Here 

a multivariate Gaussian mixture density is used in numerical experiments, but other distributions can just 
as well be applied. 

More generally, the methods used throughout this study are applicable to any model that can be 
expressed in the form: 


log/^wCyi* (0) ) = £ l s (M*r) + c(y) [Eq 20] 

seS m 

where the functions l, depend on all of y, and c is an arbitrary function of y. This type of model has been 
used extensively in texture segmentation applications (Derin and Elliott 1987; Derin and Cole 1986; 
Manjunath, Simchony, and Cheliappa 1990; Bauman and Liu 1991). 
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The choice of models for X was restricted to have two important properties. First, the pixels in X 1 " > 

were conditionally independent given the pixels in X 1 ** 11 . Second, each pixel X s M was only dependent on 

a local neighborhood of pixels at the next coarser scale. This set of neighboring locations to s is denoted 
by ds. Based on these properties, the transition distribution from coarse to fine scale takes the form: 

P x m 1x m(x ( h) | X (B+1) ) = U p x i*) lx c.*i) (*j n) | j 4" +1) ) [Eq 21 ] 


where p x c .,^.<. is the probability density for x x H) given its neighbors at the coarser scale, These two 

assumptions assure that the pixels in xf K> will be conditionally independent given the pixels in 
However, note that nearby pixels in xf" 1 may still be highly dependent since they will share coarser scale 
neighbors. 

The choice of neighborhood 3$ for the pixel s is important since it will define the structure of the 
multiscale pyramid. Two types of neighborhoods are used. The first corresponds to a quadtree structure 
and allows simple and exact calculation of the SMAP segmentation. The second neighborhood 
corresponds to a more general graph structure and can account for more complex interactions that occur 
across blocks of the quadtree. Therefore, this model produces smoother, less blocky segmentations. 
Unfortunately, the graph structured pyramid does not allow exact calculation of the SMAP estimator. 
Therefore, a hybrid model, which incorporates both the quadtree and graph structure, is used to 
approximate the exact SMAP estimate. 


Quadtree Model 

The first pyramid structure considered is based on a conventional quadtree. Figure 3a shows the 
structure of the quadtree, and the one dimensional analog is shown in Figure 3b. Since this is a tree 
structure, each point in the pyramid depends only on a single point at the coarser scale. This coarser scale 
neighbor of a point s is called the father of s and will be denoted by the function d(s). In a similar vein, 
the n ,h successive father of a point will be denoted d"(s), and d~"(s) will be the set of all points that occur 
n levels down from s in the tree structure. 
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(a) ( b ) 

Figure 3. (a) Quadtree Structure Used for MSRF Model; (b) One-Dimensional Analog to Quadtree 
Structure. 


The following transition function models the probability that -X^has class m, given that its father 
is of class k: 

p,» i,r>o» i *) - M-.* +|Eq 221 

where 8*. is the unit sample function and M is the number of possible classes. The parameter 0 nO e [0,1] 
is the probability that the labeling will remain the same from scale n +1 to n. If a class change does occur, 
it is equally likely to be any one of the remaining class types. At fine resolutions, the neighboring pixels 
are more likely to have the same class. Therefore, will generally be an increasing function of 
resolution (decreasing function of n). Notice that this distribution only depends on the the scale n through 
but does not depend on the particular pixel s. 

An important property of the quadtree structure is that the conditional distribution of Y given Xf H> 
has a product form that may be computed using a simple fine-to-coarse recursion. Let y/"’ be the set of 

leaves of the quadtree that are on the branch starting at seS 1 "’. Specifically, y, <B> ={y r : d\r) = . 

Appendix B shows that the conditional density for Y given X t " > has the product form: 

P y \ x v(y l* (B) ) = II [Eq 23] 

seS ( "> 
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Furthermore, the density functions may be computed using the following recursion where M is the 

number of class labels: 

= II E P y?\ x ?(yf\ m ) P x l*\/*"Am\k) [Eq 24] 

red' l (s) m * * 

In practice, dynamic range considerations mandate that the logarithm of these functions be computed and 
stored. Therefore, define the log likelihood function as: 

4 w (*) 4 togp,»v»'0'i'' > l*> [Eq 25] 

where the dependence on y is suppressed for clarity. Substituting the transition distribution of {Eq 22J 
into [Eq 24] and converting to log likelihood functions yields the new recursion: 

4 (0) (*) = log p y> (Jt <® (y s I *) [Eq 261 

ir°«) - E logje^Mpl^W) *1^2 E explfwi) lEq27| 

re4'*(s) M "“1 

To perform the SMAP segmentation, these log likelihood functions must be computed at every point 
in the quadtree. The number of points in the pyramid is approximately given by ^, where N is the 

number of pixels at the Finest scale. Since each of the log likelihood functions may be stored in the form 
of M numbers, the total required storage is approximately Because the second sum of [Eq 27] 

is not a function of k, each log likelihood function can be computed in time proportional to M. 
Therefore, the total computation time for evaluating the log likelihood functions is O (MN). 


Py(-» {x <"»(y { s n * l) \k) 
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Note that in the limiting case of Q nj[ Jk) = 1, the recursion reduces to simple averaging: 

l ( s n ' l \k) = 52 [Eq 28] 

re<r‘(s) 

This is equivalent to assuming that all the pixels in the block of pixels, must have the same label. 

Of course, this is often not the case since some of the pixel blocks are likely to fall on region boundaries. 
These blocks then have a mixture of pixels from the two regions. Therefore, simple averaging can cause 
pixels on region boundaries to be misclassified. This is particularly true if the statistical average of the 
two regions appears to have characteristics of a third class. Another advantage of the more accurate 
recursion, [Eq 27], is that a small group of anomalous pixels will not adversely affect the classification 
of a large region. Simple linear averaging of the log likelihood function tends to be easily biased by a 
few pixels that are statistical outliers, whereas the more accurate recursion will tolerate such errors. 

Once the likelihood functions are computed, the SMAP segmentation may be efficiently computed 
using [Eq 18]: 


i (B) = argmax J] {/j'Vj' 0 ) + togP x «irf-*i>(*j W I*** 0 )} [Eq29] 

<<"> «s<«> 1 ' * J 

This expression is easily evaluated by minimizing with respect to each pixel individually: 



arg 


max (k) + logo *d 

1 skiM' * 1 * 



[Eq 30] 


This coarse-to-fine segmentation operation requires order O(MN) computation time. Therefore, the 
complete segmentation process consists of a fine-to-coarse and a coarse-to-fine operation, each of which 
require O(MN) computation. 


While the quadtree model results in an exact exprjssion for computing the SMAP segmentation, 
it does not completely capture some aspects of image behavior. The following section introduces an 
augmented model that improves on the quadtree. 

I 
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Pyramidal Graph Model 


An important disadvantage of the quadtree model is that spatially adjacent pixels may not have 
common neighbors at the next coarser scale. Therefore, the model does not enforce continuity of region 
boundaries when they pass across branches of the quadtree. For example, if the image is broken into four 
quadrants at the second level of the quadtree, then a region boundary will not be constrained to be smooth 
as it passes from one of these quadrants to the next. 

This weakness may be corrected by increasing the number of coarse scale neighbors for each pixel. 
Figure 4a shows such a pyramidal graph structure and Figure 4b shows an analogous one dimensional 
graph. Notice that each point has three neighbors at the next coarser scale. 

To express the positions of these neighbors, a pixel was explicitly denoted at scale n as s=(i, j), 
where i and j are the row and column indices starting at 0 and ranging through the width minus one, and 
the height minus one respectively. The three neighbors at scale n+1 may then be computed using the 
function odd(i)=\ if i is odd and -1 if j is even, and the greatest smaller integer function |_-J: 

s, = (LV2J, Ljr/2J) 

s 2 = (U/2U.//2J) +Mrf(i),0) [Eq 31] 

s, = mwb+(.o,<mj)) 



(a) (b) 

Figure 4. (a) Augmented Pyramidal Graph Structure Used for MSRF Model; (b) One-Dimensional 
Analog of Pyramidal Graph Structure. 
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The transition function chosen for this pyramid graph is a natural extension of the transition function used 
for the quadtree based model: 


p x ( .), | i,j,k) 

S 1 <S 

= P(Xf = m|x£ +1) = i,x£ +1) =j,X^ l) =k) 


e . l-e^, 

= —(3fi M ., + 26 m , + 26 lllt ) +-— 

q v **•* mj in,*' 


[Eq 321 


The notation p is used to distinguish from the transition distribution used for the quadtree model. 
As with the quadtree model, the parameter 0,, € [0,1] determines the probability that the label of the fine 
scale point will be the same as one of the coarser scale points. Conversely, 1 - 6„, is the probability that 
a new label will be randomly chosen from the available labels. 


The disadvantage of the pyramid graph structure is that the likelihood function for the labels does 
not have a product form as was the case for the quadtree in [Eq 23]. Therefore, there is no simple 
fine-to-coarse recursion with the form of [Eq 24] fear computing the likelihood of the image y given the 
labels xf*>. 


For computations at a single scale, n, this problem may be circumvented by assuming that the 
pyramid has a quadtree structure for scales finer than n, and a graph structure for coarser scales. Figure 5 
shows a one-dimensional analog to this hybrid pyramid structure for n=2. Notice that, for levels above 
n, the pyramid is a graph, but below n the pyramid has a simple tree structure. Using this hybrid 
pyramid, the conditional likelihood of [Eq 18] then has the computable form: 


lOgPy.jtW |x ("*'>Cy» x (B) I -* ( " +1) ) = £ 

seS w 


ls n X4 H) ) + tog p*<"> ,,<»♦»(* 


(*) 


a(« + !K 
x as ) 


[Eq 33] 


which results in the following formula for the SMAP estimate of X 1 " 1 . 


if = arg max lf(*) 

l iksM 


[Eq 34] 
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Figure 5. One-Dimensional Analog to Hybrid Graph Structure, Where Scales n >2 Use a Pyramidal 
Graph Structure, and Scales ni2 Use a Quadtree Structure. 

xf = arg max [if (k) + logp x «, <»*«)(* | +1} )l [Eq 35] 

lsJtsA^ * 1 * J 

where p is the transition function of [Eq 32], and lf(k) are computed using the recursion of [Eq 27]. 

Of course, the application of the above formula at all scales is an approximation to the true SMAP 
segmentation since the model may not legitimately be changed during the segmentation process. However, 
this approximation is reasonable since the likelihood functions are primarily dependent on the image data 

and have only a secondary dependence on the pyramid structure. Intuitively, if the pixels in y/" 1 appear 

to be principally from class k, then the likelihood function l^\k) should be relatively large regardless of 
the pyramid structure used. 

All the following analysis and experimentation will assume the use of this hybrid quadtree-graph 
structure. As in the case of the quadtree, segmentation using the hybrid structure is done in two steps, 
each of which only requires order MN computation. The first step is a fine-to-coarse computation given 
by [Eq 27]. The second step is a coarse-to-fine computation given by [Eq 35]. Together the total 
computation is of order O(MN). 
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Parameter Estimation 


In typical applications, one does not have prior information about the exact behavior of the 
segmentation X. However, it is possible to determine this information directly from the image as it is 
being segmented by estimating the parameters 0, = [0 bB , 0 Bl ] during the segmentation process. 

The parameters 0„ o are required for fine-to-coarse operations used in computing the log likelihood 
functions, and the parameters 0 Bl are required for coarse-to-fine operations used in the SMAP 
segmentation. However, both of these parameter vectors will be estimated during the coarse-to-fine 
operations. This means that the segmentation process will require two full passes composed of the 
following steps: 

1. Perform fine-to-coarse operations using initial parameter values 0„ o = 1. 

2. Estimate parameters 0 BjO and 0„, during coarse-to-fine operations. 

3. Perform fine-to-coarse operations using estimated parameters. 

4. Re-estimate parameters 0„, during final coarse-to-fine segmentation. 

The estimation procedure of 2 and 4 above is performed sequentially at each scale. Each transition 
parameter vector 0 B is estimated concurrently with the computation of the segmentation x ln) . The 
computational overhead of this estimation procedure is greatly reduced by subsampling the image at high 
resolutions. Since the number of pixels at high resolutions is great, this subsampling does not 
substantially impact on the accuracy of the estimated parameters. 

The two-pass process implies that parameter estimation will increase computation by at least a factor 
of two. However, the additional computation required within each iteration is minimal due to 
subsampling. So the total increase in computation for parameter estimation is generally close to two. 

Begin by deriving the sequential parameter estimation procedure. The transition parameter 0„, is 
estimated by finding the maximum likelihood value given the image, y, and the previous coarse scale 
segmentation, jc <n+l) . Formally compute the solution to the optimization criteria: 

0^6 arg max P y \ x (** +!) >6- 1 ) [Eq 36] 
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where Cl is the set of valid parameter values. Using the hybrid pyramid structure of previous section, the 
log likelihood function, L, has the specific form: 


M \ 

> E i°g E 

V* = l 


[Eq 37] 


where the dependence of p on n is through the parameter 0,,. Notice that p uses the subscript^'" |jc^* u 
to emphasize that the conditional distribution assumed in [Eq 32] does not depend on s. 


This likelihood, L(d Kl ), may be maximized as a function of 0 n l by using the expectation- 
maximization (EM) algorithm (Dempster, Laird, and Rubin 1977; Bamm et al. 1970). From the form of 
the Junction pL(Q K l ) is known to be a convex function of 0„ ,. This fact shows that the EM algorithm is 
guaranteed to converge to a value of 0„ , which maximizes L. 

To apply the EM algorithm, first compute the function: 

<?«>*I.**'.,.) - E[log(P,„«0’|X < " , )P, W |,«-»(X < " ) |i < ”* 1) .9^,)) [Eq381 

| Y-y,X ( "’" 


where Xf H> is a random object. The EM algorithm uses the following iterative procedure to find a sequence 
of parameters that converge to 


c 1 e arg max <2(0^,0^) [Eq39] 

S«,i e0 

To further simplify this expression a sufficient statistic for the transition distribution p »>.«.„ (m I i,j,k,Q .) 

Xq | Xjo n, I 

of [Eq 32] must be formulated. This statistic counts the number of distinct transitions that occur from 
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a particular set of coarse scale neighbors, or* * ,) to a point xj n> . Specifically, the sufficient statistic, T, is 
defined so that: 

= 52 [Eq 42J 

/=0 h =0 


where 7 and V have the functional forms: 


and 


T Uh ( m \Uj,k) 


1 if = / and 
0 otherwise 


m,j 




= l0 8 


0 , 


^ (3/ + 2 h) + 


1-0 


«,! 


Af 


[Eq 40] 


[Eq 41] 


Substituting [Eq 40] into the expression for Q obtains the simpler expression: 


O^earg max £ 

0 % ,eQ 


2 




[Eq 42] 


where: 


r u«W - £ £[r„(x J ( " ) |4* 1> )|y=y,x'-»=i<"*'>,e ll ] 

= tEq431 
«j" Ef-1 “Pft 0 ” (*))p> ur > (* I4"* 1 ’. e^,) 


Evaluation of T'j, is computationally expensive since it requires a summation over all the points in 
S w . However, accurate estimation of 0, only requires a representative sampling of the image data. This 
is particularly true at high resolutions where the number of pixels far exceeds the number required to 
accurately estimate these parameters. Therefore, the points in S (n> at period /*"' in both the horizontal and 

vertical directions are subsampled, and is chosen so as to still have an increasing number of 
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samples at finer scales. Experimental results given in the following section will show that subsampling 
substantially reduces computation without adversely affecting the performance of parameter estimation. 


The two steps of each iteration in the EM algorithm are then: 

E - Compute T using the parameter, 0^, sampling period P tn> , and [Eq 44]. 
M - Compute 0^' € arg max e , eQ 2(0, ,,0',) using [Eq 43], 


The M step can be efficiently computed since it requires the maximization of a convex function Q 
over an interval. This may be done with, for example, the golden section search method (Press et al 
1988). The question remains as to whether the algorithm converges to the global maximum of L. 
Appendix C answers this question by adapting the convergence results of Wu (1983), and Redner and 
Walker (1984) to prove the following theorem: 

Theorem 1: If (i) SI is a compact, convex set, (ii) L(Q) and Q(0. 0’) are continuous and differentiable 
on an open set containing Q, and (Hi) L(Q) is convex, then any limit point, 0, of the sequence has 

the property that: 

0 e arg max L (0) [Eq 45] 

ecQ 

Notice that assumption (ii) does not strictly hold since Q becomes infinite at the points 0 and 1. 
In practice, this problem can be resolved by estimating 0„, over some smaller interval Q = [0+e,, 1 -£,]. 
In any case, the introduction of e, is required for numerical stability. 

Finally, the parameters ®nt) are estimated using the statistics T resulting from convergence of the 
EM algorithm. 



EL 


EL E 


T 

*= o l,h 


[Eq 46] 
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These values are then used in the second pass of fine-to-coarse computations. 

The complete SMAP segmentation algorithm with parameter estimation is summarized below: 

1. Set the initial parameter values for all «. 0 B jd= 1. and Bn.|=0.5. 

2. Compute the likelihood functions using [Eq 27] and the parameters 0 nO . 

3. Compute i (U using [Eq 34]. 

4. For scales n-L-\ to n =0 

(a) Use the EM algorithm to iteratively compute 0 n , and f. Subsample by P 1 " 1 when computing 
f, and stop when j 0^*, 1 -0jj, | <e 2 . 

(b) Compute 0 nO using [Eq 46]. 

(c) Compute x (n) using [Eq 35], 

(d) Set , =0 B l (l -10e 2 ) 

5. Repeat steps 2 through 4. 

Since global convergence of the EM algorithm is guaranteed, the choice of e, and only impacts 
on the accuracy of convergence. The values e^lO" 6 and e^lO 4 were used in this study, and were never 
found to be problematic. Generally, smaller values of e, and ^ will lead to better estimates but slower 
convergence. Also, e, should be chosen so that £,<<£ 2 . Also note that step 4d is used to accelerate 
convergence of the EM algorithm by starting the new parameter values near the previous parameter values. 
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5 EXPERIMENTAL RESULTS AND IMPLEMENTATION 


Results 

This chapter compares the performance of the SMAP algorithm with MAP segmentation using a 
MRF prior model for the pixel classes. All results of the SMAP algorithm used unsupervised estimation 
of the MSRF parameters. Unless otherwise stated, subsampling was used in all experiments, and the 
subsampling period was always chosen using the formula: 

P(n) = max 1} ^ 47 i 

This generally resulted in sufficient sampling to accurately estimate parameters while keeping the 
computational overhead of parameter estimation to a minimum. 

For this comparison to MAP estimation, a conventional 8pt neighborhood MRF model (Bouman and 
Liu 1991) was chosen for the class labels X. Specifically, the probability distribution has the form: 

p x (x) = - exp {-X (v/2 - l) (f, (*) + t 2 (x)ly/ 2)} [Eq 48] 

Z 

where X=1.5, t, is the number of horizontal and vertical neighbors of different class, and t 2 is the number 
of diagonal neighbors of different class. This appeared to yield the best overall results. 

Since the MAP estimate can not be exactly computed, an optimization method must also be chosen. 
Here the methods used were simulated annealing (SA) (Geman and Geman 1984), ICM (Besag 1986), and 
multiple resolution segmentation (MRS) (Bouman and Liu 1991). The ICM and SA methods were started 
with the maximum likelihood estimate of the segmentation. SA used a temperature schedule of the form: 

[Eq 49] 

where r<j=l, T flm (=0.2666 and A were chosen to achieve the desired number of iterations. Both 100 and 
500 iterations were used to compare the performance. After the desired number of iterations, ICM was 


n+l 


= _L + a 

T. 
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used (equivalently 7=0) to assure convergence to a local minima. These annealing parameters were 
chosen since they seemed to give the best performance over the range of test images used. In practice, 
the choice of annealing parameters must be a compromise since the optimal parameters depend on the 
specific image being processed. 

The MRS algorithm differs from ICM and SA since it effectly uses different values of A. at each 
scale. The scale dependent X is used because the MAP estimate would otherwise have unreasonable 
behavior at coarse scales (Bouman and Liu 1991). Intuitively, the MRS algorithm attempts to 
approximately correct the undesirable properties of the MAP estimator by varying the prior model. 

To test accuracy with real data, a multispectral remotely sensed image was segmented and the results 
were compared to measured ground truth to determine classification accuracy. The image used was a 927 
x 1097 three-band SPOT (Systeme Probatoire d’Observation de la Terre) image with a spatial resolution 
of 20m. Ground truth information was collected along 90 positions (transects) placed randomly 
throughout the full image (Tazik et al. 1992). Each transect is 100m long with a random orientation.’ 
Along each transect, detailed measurements of ground cover were made, but for this experiment only five 
classes were considered based on the aggregate measure of percent bare ground for each transect: 0 
percent; 1-10 percent; 11-21 percent; 21-30 percent; and 31-100 percent. Sixty of the transects were 
randomly chosen to use for training the class models, and the remaining 30 were used to test the 
segmentation algorithm performance. 

Each of the five classes was modeled as a multivariate Gaussian mixture. The mixture model is 
important because it captures the multimodal spectral behavior that is typical of most classes. The method 
for estimating the parameters of the mixture model is described in Appendix D. 

Only 100 iterations of SA were used due to the excessive computation time required to process the 
large image. The percent misclassification was also computed using the 30 testing transects for each class. 
The results are tabulated in Table 1. For each class and algorithm, the average region size was computed. 

Gasses were formed from ranges of percentage bare ground, and percent classification accuracy was 
tabulated for each class. The classification accuracy of the SMAP segmentation was substantially better 
than the maximum likelihood algorithm and slightly better than ICM. The SMAP and SA algorithms had 


Also, to increase the number of pixels associated with each transect, the transects were considered to be 60m wide, thus 
covering approximately 15 pixels on the image. 
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Table I 

Tabulated Results for the Segmentation of Multispectral SPOT Data With Ground Truth 



Class 1 

Class 2 

Class 3 

Class 4 

Class 5 

Class 

Average 

Percent Bare Ground 

0% 

1-10% 

11-21% 

21-30% 

31-100% 


Accuracy 

SMAP 

88.1% 

27.8% 

17.5% 

27.5% 

93.5% 

50.88% 


SA 100 

88.1% 

22.5% 

17.5% 

27.5% 

96.0% 

50.32% 


ICM 

87.5% 

23.2% 

17.5% 

27.5% 

93.5% 

49.84% 


ML 

78.5% 

26.5% 

17.5% 

20.0% 

85.1% 

45.52% 

Average Region 

SMAP 

131.7 

18.7 

10.7 

21.2 

57.7 

48.0 

Area 

SA 100 

126.9 

16.4 

am 

14.7 

49.9 

43.1 


ICM 

112.2 

13.2 

7.2 

12.9 

44.2 

37.9 


ML 

32.1 

3.9 

3.9 

4.7 

16.1 

12.1 


approximately comparable accuracy with SMAP performing 5.3 percent better for class 2 and SA 
performing 2.5 percent better for class 5. 

It is also interesting that the SMAP segmentation produces the largest average region sizes. This 
is because the SMAP segmentation produces fewer regions containing a small number of pixels. 

Implementation 

The multispectral SMAP segmentation algorithm and the multimodal mixed signature model have 
been implemented in the Geographical Resources Analysis Support System (GRASS) Version 4.1 
(Westervelt, Shapiro, and Gerdes 1993). The mixed signature model is generated from ground-truth data 
by the GRASS i.gensigset module, and the SMAP segmentation is performed by the GRASS i.smap 
module. 
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6 CONCLUSIONS AND RECOMMENDATIONS 


This study has derived a new criterion and model for statistical image segmentation. The SMAP 
estimator is proposed because it minimizes the expected size of the largest misclassified region, and it 
results in a computationally simple segmentation algorithm. The MSRF model uses a pyramid structure 
to capture the characteristics of image behavior at various scales. Because the MSRF has a Markov Chain 
structure, its parameters can be estimated efficiently from the image during segmentation. 

Experiments indicate that the SMAP algorithm performs comparably to or substantially better than 
MAP estimation using a Markov random field model and simulated annealing. In addition, the SMAP 
algorithm requires less computation than ICM and much less then simulated annealing. The SMAP 
algorithm was tested on multispectral SPOT data using LCTA transects ground-level data and found to 
improve segmentation accuracy. 

The single test run that was done used an image from only one date, a single sensor (SPOT), and 
a single LCTA ground-truth component, i.e., percent ground cover. Also, the linear nature of the LCTA 
transects required that the transects be artificially widened to include more pixels. This is necessary for 
statistical sampling reasons and for the spatial nature of the remotely-sensed imagery. 

It is recommended that: 

1. Further tests with LCTA ground-truth data sets using other components (e.g., plant community 
classes) should be used to test the efficacy of the segmentation model for otner ground measures. 

2. If LCTA transect information is to be used, rigorous methods should be found to “widen” the 
transects to more closely match the resolution of the imagery and to encompass more image pixels. This 
general recommendation is not isolated to the requirements of these algorithms, but is a general 
requirement for classification of remotely-sensed imagery using ground-truth data. 


3. Other physiographic regions should be used to test the applicability of these algorithms to 
other regions of the earth. 






4. Timing of the ground-troth data acquisition and the acquisition of the imagery may have 
impacts on the ability to correlate these data. At least one set of images should be acquired that relate 
very closely in time to the acquisition of the ground-troth to minimize the temporal effects due to 
differences in acquisition dates. 


5. The efficacy of these algorithms should be tested with other sensors. SPOT has 3 spectral 
bands, each with 20m spatial resolution. Sensors with more spectral resolution (e.g., Thematic Mapper 
(TM) that has 6 spectral bands, with 30m resolution) could be used for testing. Also, since the 
segmentation algorithm is contextual, higher spatial resolution imagery (e.g., airborne multispectral 
scanners) could also be tested. 
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APPENDIX A: Optimality Property for SMAP Iteration 


Assume that Jt (0 has been computed for i>n. Compute x (n) using the Markov chain structure of the 
random fields X* n> . 


L 

Jt* B) = argmax max £ 2*P(X® = x (i) i±k\Y = y) 

x®i*n k=0 

L 

= argmax max max 52 2* P(Yedy, X (0 =x®i 2 * k) 
,<«) *»/<„*«,>**=0 


= argmax max 

*(*) x (0j< n 

n 

■ J2 2*P(y6<fy,X (0 =x (0 ifc^i^n|X (0 =Jc (0 i>n)P(X (0 =i (0 i>n) 
*=o 


♦ E 

k=n*l 


2 k P{Yedy,X®=x®i>k) l 


[EqAl] 


= argmax max 52 2 k P(Y£<fy,X® =x (0 * s i <; n |x (B+1 >=i< B+l >) 

jt ( *> x (0 i<n k=0 

= arg max{ P(Yedy, X (B > = x (B) | X (B+1) =x (n+1) ) 

x ( '> 

n-1 

+ max 52 2*' B P(y e<fy,X« = xH * i sn|X< B+1 > = i< B+1 >) > 

x (l >i<n k=0 
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Next, define a residue term, R(x{*>), so that the following equality holds: 




= aigmaxP(Yedy,X ln) = x {n) |X (B+1) = i (n+1) ) (1+/?(jc (n >)) 

jf<«> 

= arg niax/^w . (■*,) _ (x w j i (B+1) , y) (1 +rt(jt (n) » 

jc<«> 


Specifically, R(xf K> ) is given by 


[Eq A2] 


- x ( \k<.iin\X<"-'> = *<”•») |C 

max —— - - -!_[Eq A3] 

*».*<« P(yerfy,X (B) = x (B) |X< B+1 > = i (B+1) ) 

Since this expression is the ratio of positive quantities, it follows that R>0. Further, R may be bounded 
from above as follows: 



n-1 

R(x^ = max T 2 k n P(X® = x®Jczizn-l \X™ = jc ( b) , Y = y ) 

jc ®, i<n k =0 
n-1 

^ max ^ 2* B P(X (B ' 1) = x (B_1) | X (b) = x {n) ,Y = y) 

*«•-» *=0 [Eq A4] 

s maxP(X (Bl) = x (B-1) |X (B > = x (B) ,X = y) 

x<-‘) 

= max/> („.,) lxW (x (B_1) | jc (n) ,y) 

*«•-*> 


Finally, 


i (,,) = argmM{log^ W|x(( ,M) i:y (x (B) |i ( " +1) ,y) + log(l + /?(x (B) )) - 1} 
= a^nuK^og/i^i^M^CxW |i (B+1) ,y)+ r(x (B) )} 


[Eq A5] 
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APPENDIX B: Product Form for Quadtree 


This appendix shows by induction that, for a quadtree based pyramid structure, the distribution of 
Y given X! 0> has the form of [Eq 23] and that the terms in the product may be computed using the 
recursion of [Eq 24]. For n=0, these relations are true by assumption. So, assuming the result for scale 
n yields: 


P(Yedy |X ( " +1) = jc<" +1 >)= £ P(Yedy |X (B) = x^p xW , C* (n) |jr (n+1 >) 

,<*) 


E(n vi^i^lfn 

Jt w lr£S<»> (reS« 


M 


= II E Py (m) |x w (Yr n) I 4 n) )P x ™ )*£*» (4 n) I ^ + I> ) 

r€S<"> x w = 1 


M 


= II n E Py? |,« (Yr n) I 4 n) )P x y |x f ( " M) (*r n) I * 

s€S iu "'> red' 1 (s) x*" 1 - 1 


(") i „(«♦!) 

s 


- n P, r .„ r (yr , ur* ,) ) 

seS ( " M > 


[Eq Bl] 


where 


P y f* *)|x<**') (y { " + 1) |^) = II E Py? |a« ft™ I OT )|xf*°< w I*) [Eq B2] 

re<T'ts) «■ l ' ' 
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APPENDIX C: Global Convergence of EM Algorithm 


This appendix proves theorem 1 by extending basic results on the convergence of the EM algorithm 
(Wu 1983, Redner and Walker 1984). 

By the stated assumptions and theorem 4.1(v) of Redner and Walker, any limit point, 0, of the 
sequence {0 P | 7 has the property that 6 e arg max 8cn 0(0,d). Since Q is convex and Q is differentiable, 
this implies that for all 6 € ft 


D 1 - 0 (2(0,0) (6-0) s; 0 [EqC,] 

where D 10 computes the gradient of Q with respect to the first argument. 

Let ft 0 be an open set containing ft such that Q and L are continuous, differentiable on ft„. Then 
define the continuous, differentiable function H( 8,0) = Q(0,0) - L(0) on ft„. It has been shown 

(Dempster, Laird, and Rubin 1977) that H has the property 6 e arg ma x Be( 1 //(0,6), which implies that 

DL(6) = D 1 ' 0 <3(6,6) +D l '°//(6,6) IEq C 2] 

= D 1 ' 0 <3(0,6) 


Therefore, for any 0 e ft. 


Z>L($)(0-0) = D 1>o <2(0,0)(0-§) ^ 0 


[Eq C3] 


Since L is a convex function and ft is a convex set, it follows that 0 is an (implied) global 
maximum of L. 
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APPENDIX D: Automatic Parameter Estimation for Mixture Signatures 


It is often desirable to segment multispectral images into regions corresponding to various 
information classes. For example, forest, grasslands, or urban areas are examples of information classes 
that may be of interest. In practice, these information classes often contain subclasses, each with 
distinctive spectral characteristics. The objective of mixture classes is to form a probabilistic class model 
composed of a number of spectral subclasses. Each subclass is then characterized by a set of parameters 
describing the mean and variation of the spectral components. 

For each mixture class, it is necessary to determine the number of subclasses and the parameters of 
each subclasses. This can be done by using a representative sample of training data from each class and 
estimating the number of subclasses and their parameters from this data. Specifically, for a given 
information class with K subclasses, the following parameters are required for the k h subclass where 
1 <k<K. 

n t - the probability that a pixel has subclass k. 

H k - the spectral mean vector for subclass k. 

R k - the spectral covariance matrix for subclass k. 

Let X), x 2 , x N be the N samples used to train the class in question. Assuming that each subclass 
has a multivariate Gaussian distribution, the probability that the pixel x„ is from subclass k is given by: 

* ^j55|A| IEqD " 

where M is the number of spectral components. 

The integer value K is known as an order parameter since it identifies the order of the model. 
Initially, assume that K is known, and then introduce a specific method for computing it. The parameters 
n k , ft k , and R t may be computed for a fixed K by using the expectation maximization (EM) algorithm 
(Baum et a], 1970; Dumpster, Laird, and Rubin 1977; Redner and Walker 1984). The EM algorithm is 
an iterative method for computing maximum likelihood (ML) estimates. The ML estimates of n k , n k , and 


46 








R k are the values of these parameters that maximize the log probability of the data x. The log of this 
probability may be exactly computed as: 


N 

log p x (x) = £ log 

Jl = l 

and is often referred to as the log likelihood. 

The EM algorithm works by clustering the data into groups corresponding to each subclass. 
However, instead of the membership to each subclass being deterministic, the membership is represented 
using a “soft” probability. The probability that pixel x H belongs to subclass k may be computed using 
Bayes rule as: 




U=i 


[Eq D2] 


/>*!* (*!*«> 


5Xi />*!*(*« I*)** 


[Eq D3] 


Then using these “soft” subclass memberships compute new spectral mean and covariance estimates for 
each subclass. These new estimates are denoted by It k , P t , and R k to distinguish them from the initial 
estimates values n k , p k and R k : 

"* * t;E Pk\x (*!-*»> 

" n = l 

** = x »Pk\x&\ x ) [EqD4] 

N n = 1 

K * E WPk\x&\ x ) 

n n = l 

The application of [Eq D2] followed by [Eq D4] improves the parameter estimates by increasing the log 
likelihood of [Eq D2]. Therefore, iterative application of these two operations will converge to a (local) 
maximum of the log likelihood. Specifically, the procedure is repeated until the change in pjx) is less 
than e where: 
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€ = —— {1 +A/ + (M * 1)M ]logAr [EqD5] 

100 { 2 J 8 

Estimation of K is difficult because ML estimation does not work for K. Intuitively, the log 
likelihood may always be increased by adding more subclasses since more subclasses may be used to more 
accurately fit the data. This problem is addressed in the work of Rissanen (1985). Rissanen suggested 
that the best model order is the one that minimizes the total number of bits required to store or transmit 
the data and parameters. Using a theoretical argument, Rissanen showed that this minimum description 
length (MDL) model can be approximately achieved by minimizing the criteria: 

MDL = - log p x (jc) + - (# of parameters) log N [Eq D6] 

2 

where the number of parameters is the total number of degrees of freedom in the parameter sets of all 
subclasses: 


(# of parameters) = Jc|l + Af + j - i [Eq D7] 

Notice that, for fixed K, minimizing the MDL criteria is equivalent to maximizing the log likelihood. 
The overall strategy for computing the mixture class parameters may be outlined as follows: 

1. Initialize the class with a large number of subclasses, K=K 0 . 

2. Apply the iterative EM algorithm to minimize the MDL criteria. 

3. Record the value of the MDL criteria, K and the subclass parameters. 

4. If the number of subclasses is greater than 1, combine the two subclasses that have minimum 
“distance,” and go back to step 2. 


5. Choose the subclass parameters corresponding to the minimum value of MDL. 







It remains to define the measure of distance between subclasses, and the initialization procedure. 
The initial subclass parameters are chosen to be: 



(i t - x n wheren = 

A 


where / is the identity matrix and [_-J is the greatest smaller integer function. The function used to 
measure distance between the subclasses is: 


d(k,J) = 



(Eq D9] 


where R tJ may be computed using 


Hj = 


n k n j 

»k + -— »i 


n k + n j 


n k * itj 


[Eq DIO] 


r kj - r~~r ( R ** * 7TT- (*/* 0*/' i*v> <•*; ‘ 

n-* + Tly 71 * ^ Kj 
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